Linguistic Processing Pipelines: Problems and Solutions

Author

  • Graham Wilcock
Abstract

Many of the typical tasks in linguistic processing pipelines can be done with tools from OpenNLP (http://opennlp.sourceforge.net). There are OpenNLP components for sentence detection, tokenization, part-of-speech tagging, phrase chunking, syntactic parsing, named entity recognition, and coreference resolution. These tools have been widely used for several years, and form a good basis for illustrating issues in configuring linguistic processing pipelines. The paper compares several different approaches to creating pipelines based on the tools. OpenNLP tools use maximum entropy-based statistical language models. Recent versions of OpenNLP provide ready-made models for several languages, including German.
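
As a rough illustration of the pipeline idea described in the abstract, the sketch below chains three stages (sentence detection, tokenization, tagging) so that each stage consumes the output of the previous one. It is only a minimal sketch under stated assumptions: the names `detect_sentences`, `tokenize`, `tag_tokens`, and `run_pipeline` are hypothetical and invented here, and the stages use toy regular-expression rules as stand-ins. They do not call OpenNLP and do not use maximum entropy models.

```python
import re
from typing import Callable, Dict, List

# A document is a plain dict; each stage reads earlier results and adds its own.
Doc = Dict[str, object]
Stage = Callable[[Doc], Doc]


def detect_sentences(doc: Doc) -> Doc:
    # Toy rule (assumption): a sentence ends at '.', '!' or '?' followed by whitespace.
    text = str(doc["text"]).strip()
    doc["sentences"] = [s for s in re.split(r"(?<=[.!?])\s+", text) if s]
    return doc


def tokenize(doc: Doc) -> Doc:
    # Toy rule (assumption): a token is a run of word characters
    # or a single punctuation mark.
    doc["tokens"] = [re.findall(r"\w+|[^\w\s]", s) for s in doc["sentences"]]
    return doc


def tag_tokens(doc: Doc) -> Doc:
    # Placeholder tagger (assumption): a real tagger would use a trained
    # statistical model; this one only separates numbers, words, punctuation.
    def tag(token: str) -> str:
        if token.isdigit():
            return "NUM"
        if re.fullmatch(r"\w+", token):
            return "WORD"
        return "PUNCT"

    doc["tags"] = [[tag(t) for t in sent] for sent in doc["tokens"]]
    return doc


def run_pipeline(text: str, stages: List[Stage]) -> Doc:
    # Run the stages in order; a later stage depends on keys set by earlier ones.
    doc: Doc = {"text": text}
    for stage in stages:
        doc = stage(doc)
    return doc
```

Calling `run_pipeline(text, [detect_sentences, tokenize, tag_tokens])` returns a dict holding the sentences, the per-sentence tokens, and the per-token tags. The order of the list matters: swapping two stages raises a `KeyError`, which is one simple example of the configuration issues that arise when assembling such pipelines.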

Similar resources

Designing a structured linguistic play therapy program for reading disorder: Basics and Strategies

Background & Purpose: Linguistic play therapy is a structured intervention based on the linguistic core of reading that can be modified and implemented for students with reading problems and disorders. The purpose of this study is to provide the theoretical foundations, solutions, and design principles of linguistic play therapy to empower teachers and counselors related to educational service...

Standardisation and Interoperation of Morphosyntactic and Syntactic Annotation Tools for Spanish and their Annotations

Linguistic annotation tools and linguistic annotations are scarcely syntactically and/or semantically interoperable. Their low interoperability usually results from the number of factors taken into account in their development and design. These include (i) the type of phenomena annotated (either morphosyntactic, syntactic, semantic, etc.); (ii) how these phenomena are annotated (e.g., the parti...

On the notion of a Language Products Technology

The main task of language products technology is to find solutions to technical problems in all areas of natural language processing, and to implement them in an optimal way given particular linguistic, computational, and financial constraints. The key factors that play a role in the most efficient design solutions of language products comprise the aspects of software engineering, software ergono...

Shallow Processing of Portuguese: from Sentence Chunking to Nominal Lemmatization

This dissertation proposes a set of procedures for the computational processing of Portuguese. Five tasks are covered: Sentence Segmentation, Tokenization, Part-of-Speech Tagging, Nominal Featurization and Nominal Lemmatization. These are some of the initial steps producing linguistic information, such as POS categories or lemmas, that is important to most subsequent processing (e.g. syntactic...

The proper place of men and machines in language technology: Processing Russian without any linguistic knowledge

The paper describes several experiments aimed at designing tools for processing Russian texts, namely for Part-Of-Speech tagging, lemmatisation and syntactic parsing, exploiting exclusively statistical approaches without coding any linguistic rules specifically for Russian. While not claiming any new ground for machine learning research, the results demonstrate the possibility to create state-o...

Publication date: 2009